Multifractal analysis of perceptron learning with errors

Random input patterns induce a partition of the coupling space of a perceptron into cells labeled
by their output sequences. Learning some data with a maximal error rate leads to clusters
of neighboring cells. By analyzing the internal structure of these clusters with the formalism of
multifractals, we can handle different storage and generalization tasks for lazy students and
absent-minded teachers within one unified approach. The results also allow some conclusions on the
spatial distribution of cells.



PACS numbers: 87.10.+e, 02.50.Cw 



I. INTRODUCTION

Artificial neural networks show considerable information processing capabilities, see e.g. [1]. One of the most
important tasks is the classification of data according to an initially unknown rule. Considering a set of $p = \gamma N$ input
patterns $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, $\mu = 1,\ldots,p$, there are $2^p$ possible binary functions $\boldsymbol{\xi}^\mu \mapsto \sigma^\mu = \pm 1$. Some of them are linearly
separable and can be realized by a simple perceptron

$$\sigma^\mu = \mathrm{sgn}(\mathbf{J}\cdot\boldsymbol{\xi}^\mu) = \mathrm{sgn}\Big(\sum_{i=1}^{N} J_i\,\xi_i^\mu\Big) \tag{1}$$

where $\mathbf{J} \in \mathbb{R}^N$ is called the coupling vector. Due to the scaling invariance of (1) this vector can be restricted by the
spherical constraint $\mathbf{J}\cdot\mathbf{J} = N$. The direction of $\mathbf{J}$ fixes the actual form of the classification.

Not all coupling vectors define different functions of the p input patterns. According to their possible output
sequences $\boldsymbol{\sigma} = \{\sigma^\mu \,|\, \mu = 1,\ldots,p\}$ we can group them together into at most $2^p$ cells

$$C(\boldsymbol{\sigma}) = \{\mathbf{J} \,|\, \sigma^\mu = \mathrm{sgn}(\mathbf{J}\cdot\boldsymbol{\xi}^\mu)\ \ \forall\mu\} . \tag{2}$$

These cells form a partition of the coupling space whose structure contains important information on the performance
of the perceptron in various supervised learning problems.
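The bound of "at most $2^p$ cells" can be made concrete with Cover's classical counting function for homogeneous linear threshold functions. The sketch below is an illustration, not part of the original analysis: it assumes patterns in general position (which random $\pm 1$ patterns satisfy only with high probability), and the function name `count_dichotomies` is hypothetical.

```python
from math import comb


def count_dichotomies(p: int, n: int) -> int:
    """Cover's count of linearly separable output sequences.

    Assumes p >= 1 patterns in general position in n dimensions and
    separating hyperplanes through the origin (no threshold).
    """
    return 2 * sum(comb(p - 1, k) for k in range(n))


if __name__ == "__main__":
    n = 5
    for p in (3, 5, 10, 15):
        # Compare the number of non-empty cells with the upper bound 2^p.
        print(p, count_dichotomies(p, n), 2 ** p)
```

For $p \le N$ every one of the $2^p$ output sequences has a non-empty cell, at $p = 2N$ exactly half of them do, and beyond that the fraction of non-empty cells drops quickly, in line with the storage capacity $\gamma_c = 2$ discussed in Sec. II.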

The use of statistical mechanics in the study of the coupling space for large N was initiated by Gardner [2] for
random input-output relations. Derrida et al. suggested calculating the cell size distribution, which could be
done only two years later when Monasson and O'Kane introduced a modification of the standard replica trick
in connection with multifractal techniques. Now there are several applications for perceptrons and multilayer
networks [9,10].

All these calculations consider the case where a uniquely determined output is perfectly learned by the student
network. However, there is often no need for, or no possibility of, perfectly learning a special classification, or in real
applications only noisy output data are available. Introducing an error rate corresponds to collecting several cells
into clusters. In the present paper we use the multifractal approach to characterize the coupling space structure
of the output representations in these clusters. This analysis allows us to treat various storage and generalization
problems within one approach. We include both the case of a student who perfectly learns some incorrect data (e.g.
generalization with output noise) as well as the case of a student who tries to learn a well-defined task only with a
certain error rate (e.g. storage with minimal error above the storage capacity).

The outline of the paper is as follows. In Sec. II we present the multifractal formalism for neural networks. Sec.
III contains the general calculations for the internal representations of the cell clusters. In Sec. IV and V the most
interesting cases are analyzed in detail, i.e. the storage and the generalization problems with noise. In Sec. VI we
briefly comment on the spatial distribution of the cells. A summary is given in the final section.

II. THE MULTIFRACTAL FORMALISM



Due to the geometrical nature of our problem a multifractal method is the appropriate one. In this section we
introduce the multifractal formalism as applied to perceptrons. In order to clarify the notation we review some results
obtained in earlier work for spherical couplings without going into the subtleties of the approach.

We choose $p = \gamma N$ input patterns $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, $\mu = 1,\ldots,p$, with entries randomly drawn from the distribution
$P(\xi_i^\mu) = \frac{1}{2}\,\delta(\xi_i^\mu + 1) + \frac{1}{2}\,\delta(\xi_i^\mu - 1)$. The hyperplane perpendicular to each $\boldsymbol{\xi}^\mu$ cuts the coupling space into two parts.
The patterns therefore generate a random partition of the coupling space into cells defined by (2) and labeled by their
output sequences $\boldsymbol{\sigma}$. The relative cell size $P(\boldsymbol{\sigma}) = V(\boldsymbol{\sigma})/\sum_{\boldsymbol{\tau}} V(\boldsymbol{\tau})$ describes the probability of generating the output
$\boldsymbol{\sigma}$ for a given input sequence with a coupling vector $\mathbf{J}$ chosen at random from a uniform distribution over the
whole coupling sphere. In the thermodynamic limit the cell sizes are expected to scale exponentially with N; consequently
we characterize them by the crowding index $\alpha(\boldsymbol{\sigma})$ defined by

$$P(\boldsymbol{\sigma}) = 2^{-N\alpha(\boldsymbol{\sigma})} . \tag{3}$$
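For very small systems the relative cell sizes and crowding indices of (3) can be estimated by direct sampling. The following is a minimal sketch under stated assumptions: the function names, the system size and the sample count are illustrative choices, and uniformly distributed directions are generated by normalizing nothing at all, since only the sign of $\mathbf{J}\cdot\boldsymbol{\xi}^\mu$ matters and a vector of independent Gaussians has a uniformly distributed direction.

```python
import math
import random
from collections import Counter


def sample_cell_sizes(n=8, p=5, n_samples=4000, seed=1):
    """Monte Carlo estimate of the relative cell sizes P(sigma).

    n: dimension N, p: number of random +-1 patterns (illustrative values).
    Returns a dict mapping each observed output sequence to its estimated size.
    """
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    counts = Counter()
    for _ in range(n_samples):
        # Independent Gaussian components give a uniformly distributed direction.
        J = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sigma = tuple(
            1 if sum(j * x for j, x in zip(J, xi)) >= 0 else -1 for xi in patterns
        )
        counts[sigma] += 1
    return {sigma: c / n_samples for sigma, c in counts.items()}


def crowding_index(size, n):
    """Crowding index alpha defined through P = 2^(-N alpha)."""
    return -math.log2(size) / n


if __name__ == "__main__":
    n = 8
    sizes = sample_cell_sizes(n=n)
    for sigma, size in sorted(sizes.items(), key=lambda kv: -kv[1])[:5]:
        print(sigma, round(size, 4), round(crowding_index(size, n), 3))
```

Such sampling only resolves cells whose size is not much smaller than the inverse sample count, which is why the analytical treatment below is needed for large N.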

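The storage and generalization properties of the perceptron are coded in the distribution of cell sizes defined by

$$f(\alpha) = \lim_{N\to\infty} \frac{1}{N} \log_2 \sum_{\boldsymbol{\sigma}} \delta\big(\alpha - \alpha(\boldsymbol{\sigma})\big) . \tag{4}$$

In the language of multifractals this quantity is called the multifractal spectrum. To calculate it within the framework
of statistical mechanics one uses the formal analogy of $f(\alpha)$ with the micro-canonical entropy of the spin system $\boldsymbol{\sigma}$
with Hamiltonian $N\alpha(\boldsymbol{\sigma})$. It can hence be determined from the corresponding 'free energy'

$$\tau(p) = -\lim_{N\to\infty} \frac{1}{N} \log_2 \sum_{\boldsymbol{\sigma}} 2^{-pN\alpha(\boldsymbol{\sigma})} = -\lim_{N\to\infty} \frac{1}{N} \log_2 \sum_{\boldsymbol{\sigma}} P^p(\boldsymbol{\sigma}) \tag{5}$$

via Legendre transformation with respect to the "inverse temperature" $p$ (from here on $p$ denotes this exponent, while the
number of patterns is written as $\gamma N$),

$$f(\alpha) = \min_p\,[\alpha p - \tau(p)] . \tag{6}$$

In the multifractal terminology $\tau(p)$ is called the mass exponent.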
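The connection between (5) and (6) can be spelled out by a saddle-point argument. The lines below are a sketch, assuming that for large N the sum over cells can be replaced by an integral over $\alpha$ weighted with the cell density $2^{Nf(\alpha)}$ implied by (4), and that $f$ is concave:

```latex
\sum_{\boldsymbol{\sigma}} P^p(\boldsymbol{\sigma})
  = \int d\alpha \; 2^{N[f(\alpha) - p\alpha]}
  \simeq 2^{\,N \max_\alpha [f(\alpha) - p\alpha]}
\quad\Longrightarrow\quad
\tau(p) = \min_\alpha \,[\,p\alpha - f(\alpha)\,] ,
\qquad
\frac{d\tau}{dp} = \alpha , \qquad \frac{df}{d\alpha} = p .
```

Inverting this Legendre transformation gives (6). The two derivative relations are the ones used later to identify the points $p = 0$ and $p = 1$ of the spectrum.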

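To explicitly calculate this quantity for the perceptron we start with the definition of the cell size

$$P(\boldsymbol{\sigma}) = \int d\mu(\mathbf{J}) \prod_{\mu=1}^{\gamma N} \theta\Big(\frac{1}{\sqrt{N}}\,\sigma^\mu\,\mathbf{J}\cdot\boldsymbol{\xi}^\mu\Big) \tag{7}$$

using the Heaviside step function $\theta(x)$. The integral measure

$$d\mu(\mathbf{J}) = \prod_{i=1}^{N} \frac{dJ_i}{\sqrt{2\pi e}}\;\delta(N - \mathbf{J}^2) \tag{8}$$

ensures both the spherical constraint for the coupling vectors as well as the total normalization $\sum_{\boldsymbol{\sigma}} P(\boldsymbol{\sigma}) = 1$.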

In the thermodynamic limit $N \to \infty$ we expect both $\tau$ and $f$ to become self-averaging, and we can therefore
calculate the mass exponent (5) by using the replica trick introducing n identical replicas numbered $a = 1,\ldots,n$ to
perform the average over the quenched patterns. Moreover, we introduce a second replica index $\alpha = 1,\ldots,p$ in order
to represent the p-th power of P in (5). Using an integral representation for the Heaviside function we arrive at a
replicated partition function given by

$$\langle\langle Z^n \rangle\rangle = \Big\langle\Big\langle \sum_{\{\sigma_a^\mu\}} \int \prod_{a,\alpha} d\mu(\mathbf{J}_a^\alpha) \int_0^\infty \prod_{\mu,a,\alpha} d\lambda_{a,\alpha}^\mu \int \prod_{\mu,a,\alpha} \frac{dx_{a,\alpha}^\mu}{2\pi}\, \exp\Big\{ i \sum_{\mu,a,\alpha} x_{a,\alpha}^\mu \Big( \lambda_{a,\alpha}^\mu - \frac{\sigma_a^\mu}{\sqrt{N}}\,\mathbf{J}_a^\alpha\cdot\boldsymbol{\xi}^\mu \Big) \Big\} \Big\rangle\Big\rangle . \tag{9}$$

As usual, the average $\langle\langle\cdot\rangle\rangle$ over the quenched patterns can be performed, and the overlaps

$$P_{a,b}^{\alpha,\beta} = \frac{1}{N}\,\mathbf{J}_a^\alpha\cdot\mathbf{J}_b^\beta \tag{10}$$

are introduced as order parameters. The spherical constraint restricts the diagonal elements of this matrix to one.

It is important to note that the output sequences $\{\sigma_a^\mu\}$ carry only one replica index. The typical overlap of two
coupling vectors within one cell (same output sequence $\{\sigma_a^\mu\}$) will hence in general be different from the typical
overlap between two coupling vectors belonging to different cells (different output sequences). Therefore we have
to introduce two different overlap values already within the replica symmetric (RS) approximation:

$$P_{a,b}^{\alpha,\beta} = \begin{cases} 1 & \text{if } (a,\alpha) = (b,\beta) \\ P & \text{if } a = b,\ \alpha \neq \beta \\ P_0 & \text{if } a \neq b \end{cases} \tag{11}$$

In accordance with the above discussion P then denotes the typical overlap within one cell, whereas $P_0$ denotes the
overlap between different cells.

Plugging this RS ansatz into (9) one realizes that $P_0 = 0$ always solves the saddle point equations for $P_0$. This
has an obvious physical interpretation: Due to the symmetry of the cell definition (2), and therefore of the crowding index, under the
transformation $(\mathbf{J}, \boldsymbol{\sigma}) \leftrightarrow (-\mathbf{J}, -\boldsymbol{\sigma})$ every cell has a "mirror cell" of the same size and shape on the "opposite side" of the
coupling space. $P_0 = 0$ simply reflects this symmetry.

Finally we obtain the mass exponent

$$\tau(p) = \frac{1}{\ln 2}\,\mathrm{extr}_P \Big[ -\frac{1}{2}\ln\big(1 + (p-1)P\big) - \frac{p-1}{2}\ln(1 - P) - \gamma \ln\Big( 2\int Dt\, H^p\Big(\sqrt{\frac{P}{1-P}}\,t\Big) \Big) \Big] \tag{12}$$

where we introduced the abbreviations $Dt = dt\,\exp(-t^2/2)/\sqrt{2\pi}$ for the Gaussian measure and $H(x) = \int_x^\infty Dt$. The
order parameter P is self-consistently determined by the saddle point equation

$$\frac{P}{1 + (p-1)P} = \frac{\gamma}{2\pi}\, \frac{\int Dt\, H^{p-2}\Big(\sqrt{\frac{P}{1-P}}\,t\Big) \exp\Big\{-\frac{P}{1-P}\,t^2\Big\}}{\int Dt\, H^{p}\Big(\sqrt{\frac{P}{1-P}}\,t\Big)} . \tag{13}$$

This equation can only be solved numerically; the results are shown in Fig. 1.

FIG. 1. Multifractal spectrum $f(\alpha)$ characterizing the cell structure of the coupling space of the spherical perceptron for
various values of the loading parameter $\gamma$ = 0.2, 0.35, 0.5, 1.0, 2.0 (from left to right).
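A numerical treatment of (12) and (13) can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the procedure used for Fig. 1: Gaussian integrals are approximated by a trapezoidal rule on $[-10, 10]$, the saddle point is found by bisection on $P \in [10^{-9}, 0.99]$ (adequate for moderate $\gamma$ and $p \ge 0$), and the derivative $\alpha = d\tau/dp$ is taken by a central finite difference. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_H(x):
    """log H(x), H(x) = int_x^inf Dt; leading asymptotic form for large x."""
    if x < 20.0:
        return math.log(0.5 * math.erfc(x / SQRT2))
    return -0.5 * x * x - math.log(x) - LOG_SQRT_2PI


def gauss_avg(log_g, t_max=10.0, n=1001):
    """Trapezoidal estimate of int Dt exp(log_g(t)) on [-t_max, t_max]."""
    h = 2.0 * t_max / (n - 1)
    total = 0.0
    for i in range(n):
        t = -t_max + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * t * t - LOG_SQRT_2PI + log_g(t))
    return total * h


def saddle_residual(P, p, gamma):
    """Left-hand side minus right-hand side of the saddle point equation (13)."""
    k2 = P / (1.0 - P)
    k = math.sqrt(k2)
    num = gauss_avg(lambda t: (p - 2.0) * log_H(k * t) - k2 * t * t)
    den = gauss_avg(lambda t: p * log_H(k * t))
    return P / (1.0 + (p - 1.0) * P) - gamma / (2.0 * math.pi) * num / den


def solve_overlap(p, gamma, lo=1e-9, hi=0.99, iters=60):
    """Bisection for the overlap P solving (13); the bracket is an assumption."""
    f_lo = saddle_residual(lo, p, gamma)
    if f_lo * saddle_residual(hi, p, gamma) > 0.0:
        raise ValueError("no sign change in [lo, hi]; adjust the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if saddle_residual(mid, p, gamma) * f_lo > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mass_exponent(p, gamma):
    """Mass exponent tau(p) from (12), evaluated at the saddle point."""
    P = solve_overlap(p, gamma)
    k = math.sqrt(P / (1.0 - P))
    G = gauss_avg(lambda t: p * log_H(k * t))
    s = (0.5 * (p - 1.0) * math.log(1.0 - P)
         + 0.5 * math.log(1.0 + (p - 1.0) * P)
         + gamma * math.log(2.0 * G))
    return -s / math.log(2.0)


def spectrum_point(p, gamma, dp=1e-3):
    """Return (alpha, f) via alpha = d tau / dp and f = alpha p - tau, cf. (6)."""
    alpha = (mass_exponent(p + dp, gamma) - mass_exponent(p - dp, gamma)) / (2.0 * dp)
    return alpha, alpha * p - mass_exponent(p, gamma)


if __name__ == "__main__":
    gamma = 0.5
    for p in (0.25, 0.5, 1.0, 2.0):
        print(p, spectrum_point(p, gamma))
```

Two checks follow directly from (12) and do not depend on the value of P: $\tau(1) = 0$, which expresses the normalization of the cell sizes, and $\tau(0) = -\gamma$, which counts all $2^{\gamma N}$ cells.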



The total number of cells is given by

$$\mathcal{N} = \int d\alpha\; 2^{N f(\alpha)} \tag{14}$$

and is therefore exponentially dominated by cells of size $\alpha_0(\gamma) = \mathrm{argmax}\, f(\alpha)$. Because of $df/d\alpha = p$ this point is
reached at p = 0. The random choice of any output sequence will hence lead with probability one to a cell of size
$\alpha_0(\gamma)$, and $2^{-N\alpha_0(\gamma)}$ is found to be the Gardner volume. From $\alpha_0(\gamma \to 2) \to \infty$ we find the storage capacity to be
$\gamma_c = 2$ as in [2]. For $\gamma < 2$ the problem of storing $\gamma N$ random input-output pairs is realizable with probability one.
So we have $\mathcal{N} = 2^{\gamma N - o(N)}$ and therefore $f(\alpha_0(\gamma)) = \gamma$ in the thermodynamic limit.

Although the cells with crowding index $\alpha_0$ are the most frequent ones, their joint contribution to the total volume of the
sphere is negligible. Since

$$1 = \sum_{\boldsymbol{\sigma}} P(\boldsymbol{\sigma}) = \int_0^\infty d\alpha\; 2^{N[f(\alpha) - \alpha]} \tag{15}$$

a saddle point argument reveals that the cells with size $\alpha_1(\gamma)$ defined by $\frac{df}{d\alpha}(\alpha_1) = 1$ dominate the volume. This point
is given by p = 1. Cells of larger size are too rare, those more frequent are too small to compete. Consequently a
randomly chosen coupling vector $\mathbf{J}$ will with probability one belong to a cell of size $\alpha_1$. By the definition (2) of the
cells all other couplings of this cell will give the same output for all patterns $\boldsymbol{\xi}^\mu$. Therefore $2^{-N\alpha_1(\gamma)}$ is nothing but
the volume of the version space of a teacher perceptron chosen at random from a uniform probability distribution on
the sphere of possible perceptrons. From it (or equivalently from $P(p = 1, \gamma)$) one can determine the generalization
error as a function of the training set size $\gamma$, thus reproducing the results of [11].



III. INTERNAL CELL STRUCTURE OF CLUSTERS FOR NOISY OUTPUT DATA 

In order to include noisy output data we have to slightly modify this procedure. As before, we consider a randomly
drawn set of input patterns $\{\boldsymbol{\xi}^\mu;\ \mu = 1,\ldots,\gamma N\}$ as quenched disorder. The global cell distribution consequently equals
the one in the previous section.

Now we take any output sequence $\mathbf{s} \in \{-1, 1\}^{\gamma N}$, demanding it to be learned with an error rate $\delta \in (0, 0.5)$. $\delta = 0$
corresponds to the noise-free case, $\delta = 0.5$ to outputs which are totally uncorrelated to the original pattern $\mathbf{s}$. The
realized output $\boldsymbol{\sigma} \in \{-1, 1\}^{\gamma N}$ has an overlap

$$\frac{1}{\gamma N} \sum_{\mu=1}^{\gamma N} s^\mu \sigma^\mu = 1 - 2\delta \tag{16}$$

with $\mathbf{s}$. The set of all cells $C(\boldsymbol{\sigma})$ with this output overlap forms a cluster. It is the internal structure of the cluster
which we will analyze, i.e. we calculate the internal cell spectrum of the cluster. The restricted partition function can
be written as

$$Z(\mathbf{s}, \delta) = \sum_{\boldsymbol{\sigma}} \delta\Big( \frac{1}{\gamma N} \sum_{\mu=1}^{\gamma N} s^\mu \sigma^\mu - 1 + 2\delta \Big)\, P^q(\boldsymbol{\sigma}) , \tag{17}$$

where the relative volume $P(\boldsymbol{\sigma})$ is defined by (7). In the special case of a randomly drawn sequence $\mathbf{s}$ this quantity
is closely related to a partition function considered in earlier work, where a Gibbs measure of the error rate was introduced
instead of the $\delta$-function in (17). Being self-averaging, $Z(\mathbf{s}, \delta)$ does not depend on $\mathbf{s}$ itself, but only on the size $\alpha(\mathbf{s})$
of the central cell. It can therefore be characterized by the real number p with $\alpha(\mathbf{s}) = \alpha_p$, $\frac{df}{d\alpha}(\alpha_p) = p$, in the global
spectrum. The mass exponent of the cluster is thus given by

$$\tau(q|p, \delta) = -\lim_{N\to\infty} \frac{1}{N} \Big\langle\Big\langle \frac{\sum_{\mathbf{s}} P^p(\mathbf{s}) \log_2 Z(\mathbf{s}, \delta)}{\sum_{\mathbf{s}} P^p(\mathbf{s})} \Big\rangle\Big\rangle . \tag{18}$$

This is in complete analogy to the standard calculation of canonical expectation values in statistical mechanics. A
very similar method was introduced in [13] in order to characterize metastable states in spherical p-spin glasses. In
that case one system was thermalized in an equilibrium state, whereas a second one was restricted to have a certain
overlap with the first one, which is analogous to our output sequences $\mathbf{s}$ and $\boldsymbol{\sigma}$.
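The overlap constraint (16) that defines a cluster is easy to illustrate: flipping a fraction $\delta$ of the entries of $\mathbf{s}$ yields an output sequence with overlap exactly $1 - 2\delta$. The helper names below are hypothetical and the sequence length is an arbitrary illustrative choice.

```python
import random


def flip_outputs(s, delta, rng):
    """Return a copy of s with round(delta * len(s)) randomly chosen entries flipped."""
    n_flip = round(delta * len(s))
    flipped = set(rng.sample(range(len(s)), n_flip))
    return [-x if i in flipped else x for i, x in enumerate(s)]


def output_overlap(s, sigma):
    """Normalized overlap (1/len) * sum_mu s^mu sigma^mu, cf. (16)."""
    return sum(a * b for a, b in zip(s, sigma)) / len(s)


if __name__ == "__main__":
    rng = random.Random(0)
    s = [rng.choice((-1, 1)) for _ in range(1000)]
    sigma = flip_outputs(s, 0.2, rng)
    print(output_overlap(s, sigma))  # overlap 1 - 2 * delta
```

Every sequence produced this way labels one cell of the cluster around the central cell $C(\mathbf{s})$, provided that cell is not empty.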

The internal spectrum $f(\alpha|p, \delta)$ of the cluster can again be calculated by a Legendre transformation with respect
to the 'inverse temperature' q, cf. (6).

Before explicitly performing the technical part of the analysis we want to clarify which problems can
be solved within our approach. Clearly, the value of p fixes the original learning task without noise, which corresponds
to perfect learning of the output sequence $\mathbf{s}$. As already discussed in Sec. II, p = 0, 1 are of particular importance for
storage and generalization problems. Now, q = 0 describes the most frequent cell within the cluster. If we take any
random output string $\boldsymbol{\sigma}$ having overlap $1 - 2\delta$ with $\mathbf{s}$, we will arrive with probability one in a cell of size $\alpha(q = 0|p, \delta)$.
This point corresponds therefore to a student who perfectly learns one particular incorrect output sequence. For the
generalization problem, p = 1, it gives the behavior in the presence of output noise.

On the other hand, q = 1 characterizes the volume-dominating cells of the cluster. The total crowding index of the
cluster is given by

$$\alpha_{cl}(p, \delta) = \alpha(q = 1|p, \delta) - f\big(\alpha(q = 1|p, \delta)\,\big|\,p, \delta\big) . \tag{19}$$

It describes the volume of the version space of a lazy student who is satisfied whenever he finds a coupling vector
producing errors with maximal rate $\delta$.

From the spectra for different p but fixed $\delta$ we can get some information on the spatial distribution of the cells,
i.e. whether there are more large/small cells in the environment of another large/small cell. This can be read off the
p-dependence of the internal cluster spectrum for one value of $\delta$.

In order to answer all these questions we have to calculate the mass exponent (18). We need to introduce four
replications as representation of:

(i) the logarithm of the partition function: $a = 1,\ldots,n$,

(ii) the power q in the partition function: $\alpha = 1,\ldots,q$,

(iii) the fraction in the average over all p-cells: $k = 1,\ldots,m$,

(iv) the power p in the average over all p-cells: $\kappa = 1,\ldots,p$.

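The replicated and averaged partition function consequently reads

$$Z_{m,n} = \Big\langle\Big\langle \sum_{\{\mathbf{s}_k\}} \sum_{\{\boldsymbol{\sigma}_a\}} \int \prod_{k,\kappa} d\mu(\mathbf{K}_k^\kappa) \prod_{k,\kappa,\mu} \theta\Big(\frac{s_k^\mu}{\sqrt{N}}\,\mathbf{K}_k^\kappa\cdot\boldsymbol{\xi}^\mu\Big) \int \prod_{a,\alpha} d\mu(\mathbf{J}_a^\alpha) \prod_{a,\alpha,\mu} \theta\Big(\frac{\sigma_a^\mu}{\sqrt{N}}\,\mathbf{J}_a^\alpha\cdot\boldsymbol{\xi}^\mu\Big) \prod_{a} \delta\Big( \frac{1}{\gamma N}\sum_{\mu} s_1^\mu \sigma_a^\mu - 1 + 2\delta \Big) \Big\rangle\Big\rangle . \tag{20}$$

The coupling vectors $\mathbf{K}_k^\kappa$ are elements of the p-cells, $\mathbf{K}_1^\kappa$ of the central cell of the cluster. $\mathbf{J}_a^\alpha$ lies within the cluster
cells. Using this, the mass exponent can be determined from the replica trick

$$\tau(q|p, \delta) = -\lim_{N\to\infty} \frac{1}{N \ln 2} \lim_{m,n\to 0} \partial_n Z_{m,n} . \tag{21}$$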

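The calculation of $Z_{m,n}$ widely follows standard routes and uses the order parameters

$$P_{k,l}^{\kappa,\lambda} = \frac{1}{N}\,\mathbf{K}_k^\kappa\cdot\mathbf{K}_l^\lambda \quad \forall\, k,l = 1,\ldots,m;\ \kappa,\lambda = 1,\ldots,p ,$$
$$Q_{a,b}^{\alpha,\beta} = \frac{1}{N}\,\mathbf{J}_a^\alpha\cdot\mathbf{J}_b^\beta \quad \forall\, a,b = 1,\ldots,n;\ \alpha,\beta = 1,\ldots,q , \tag{22}$$
$$R_{k,a}^{\kappa,\alpha} = \frac{1}{N}\,\mathbf{K}_k^\kappa\cdot\mathbf{J}_a^\alpha$$

for the overlaps of coupling vectors from p-cells and from q-cells of the cluster. The diagonal elements of the matrices
Q and P are restricted to one by the spherical constraint. This leads after standard manipulations to

$$Z_{m,n} \propto \int \prod_{(k,\kappa)<(l,\lambda)} dP_{k,l}^{\kappa,\lambda} \int \prod_{(a,\alpha)<(b,\beta)} dQ_{a,b}^{\alpha,\beta} \int \prod_{(k,\kappa),(a,\alpha)} dR_{k,a}^{\kappa,\alpha} \int \prod_{a} dF_a \; \exp\Big\{ \frac{N}{2} \ln\det \begin{pmatrix} P & R \\ R^T & Q \end{pmatrix} + i\gamma N (1 - 2\delta) \sum_a F_a \Big\}$$
$$\times \Big[ \sum_{\{s_k\},\{\sigma_a\}} \int_0^\infty \prod_{k,\kappa} d\eta_k^\kappa \int \prod_{k,\kappa} \frac{dy_k^\kappa}{2\pi} \int_0^\infty \prod_{a,\alpha} d\lambda_a^\alpha \int \prod_{a,\alpha} \frac{dx_a^\alpha}{2\pi} \; \exp\Big\{ i \sum_{a,\alpha} x_a^\alpha \lambda_a^\alpha + i \sum_{k,\kappa} y_k^\kappa \eta_k^\kappa - \frac{1}{2} \sum_{a,\alpha,b,\beta} \sigma_a \sigma_b\, Q_{a,b}^{\alpha,\beta}\, x_a^\alpha x_b^\beta$$
$$- \frac{1}{2} \sum_{k,\kappa,l,\lambda} s_k s_l\, P_{k,l}^{\kappa,\lambda}\, y_k^\kappa y_l^\lambda - \sum_{a,\alpha,k,\kappa} \sigma_a s_k\, R_{k,a}^{\kappa,\alpha}\, x_a^\alpha y_k^\kappa - i \sum_a F_a\, s_1 \sigma_a \Big\} \Big]^{\gamma N} \tag{23}$$

where $F_a$ was introduced to fix the overlap of $\mathbf{s}_1$ and $\boldsymbol{\sigma}_a$ to $1 - 2\delta$.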

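The determinant can be represented by a Gaussian integral having the same exponent as the quadratic part of
the second exponent in (23). By transforming the integration variable according to

$$y_k^\kappa \to y_k^\kappa - \sum_{l,\lambda} \sum_{a,\alpha} (P^{-1})_{k,l}^{\kappa,\lambda}\, R_{l,a}^{\lambda,\alpha}\, x_a^\alpha \tag{24}$$

we obtain

$$\ln\det \begin{pmatrix} P & R \\ R^T & Q \end{pmatrix} = \ln\det P + \ln\det(Q - A) \tag{25}$$

with $A_{a,b}^{\alpha,\beta} = \sum_{k,\kappa,l,\lambda} R_{k,a}^{\kappa,\alpha}\, (P^{-1})_{k,l}^{\kappa,\lambda}\, R_{l,b}^{\lambda,\beta}$. The same transformation can be made in the second exponent in (23). We
analyse the resulting expression using the replica symmetric ansatz:

$$P_{k,l}^{\kappa,\lambda} = \begin{cases} 1 & \text{if } (k,\kappa) = (l,\lambda) \\ P & \text{if } k = l,\ \kappa \neq \lambda \\ 0 & \text{if } k \neq l \end{cases} \qquad Q_{a,b}^{\alpha,\beta} = \begin{cases} 1 & \text{if } (a,\alpha) = (b,\beta) \\ Q_1 & \text{if } a = b,\ \alpha \neq \beta \\ Q_0 & \text{if } a \neq b \end{cases}$$
$$R_{k,a}^{\kappa,\alpha} = \begin{cases} R & \text{if } k = 1 \\ 0 & \text{if } k \neq 1 \end{cases} \qquad iF_a = F . \tag{26}$$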



P describes the typical overlap within one p-cell of the global spectrum, therefore fulfilling the saddle-point equation
(13) from Sec. II. $Q_1$ gives the overlap of two arbitrary couplings from the same q-cell inside the cluster, $Q_0$ the
overlap between two of these cells. Due to the fixed overlap of the cluster output with the output of the central cell,
the mirror symmetry $(\mathbf{J}, \boldsymbol{\sigma}) \leftrightarrow (-\mathbf{J}, -\boldsymbol{\sigma})$ is explicitly broken; we therefore expect $Q_0$ to be different from zero. R is
the overlap of the cluster cells with the central p-cell, whereas the overlap of a cell from the cluster with a randomly
chosen p-cell is again zero for symmetry reasons.

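Finally we get the replica symmetric mass exponent by taking the O(n)-terms for m = 0,

$$\tau(q|p, \delta) = -\frac{1}{\ln 2}\,\mathrm{extr}_{Q_0,Q_1,R,F} \Big[ \frac{q-1}{2}\ln(1 - Q_1) + \frac{1}{2}\ln\big(1 + (q-1)Q_1 - qQ_0\big) + \frac{q}{2}\,\frac{Q_0 - \frac{pR^2}{1 + (p-1)P}}{1 + (q-1)Q_1 - qQ_0} + \gamma(1 - 2\delta)F$$
$$+ \gamma \cdots \int Du \,\ln \int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big) \Big] \tag{27}$$

with

$$H_\pm = H\Big( \pm \frac{\sqrt{Q_1 - Q_0}\,w + \cdots}{\sqrt{1 - Q_1}} \Big) . \tag{28}$$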

P is given by (13) for p. Because of the integrals over complex-valued functions the general case is hard to handle
numerically, and we concentrate on the most important cases p = 0, 1, i.e. the storage and generalization problems.

IV. STORAGE WITH ERRORS

In this section we focus on the storage problem with noisy output data, i.e. the case of a central cell with p = 0.
Inserting this into (27) we can eliminate the integrals over complex-valued functions and find



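$$\tau(q|0, \delta) = -\frac{1}{\ln 2}\,\mathrm{extr}_{Q_0,Q_1,F} \Big[ \frac{q-1}{2}\ln(1 - Q_1) + \frac{1}{2}\ln\big(1 + (q-1)Q_1 - qQ_0\big) + \frac{q}{2}\,\frac{Q_0}{1 + (q-1)Q_1 - qQ_0}$$
$$+ \gamma(1 - 2\delta)F + \gamma \int Du \,\ln \int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big) \Big] \tag{29}$$

where $H_\pm$ simplifies to

$$H_\pm = H\Big( \pm \frac{\sqrt{Q_1 - Q_0}\,w + \sqrt{Q_0}\,u}{\sqrt{1 - Q_1}} \Big) . \tag{30}$$

The dependences on R and P vanish, leading to only three saddle point equations for the order parameters $Q_0$, $Q_1$,
and F:

$$0 = 1 - 2\delta - \int Du\, \frac{\int Dw\, \big(e^{-F} H_+^q - e^{F} H_-^q\big)}{\int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big)} ,$$

$$0 = \frac{Q_0}{\big(1 + (q-1)Q_1 - qQ_0\big)^2} - \frac{\gamma}{2\pi(1 - Q_1)} \int Du \left[ \frac{\int Dw\, \big(e^{F} H_-^{q-1} - e^{-F} H_+^{q-1}\big) \exp\Big\{ -\frac{(\sqrt{Q_1 - Q_0}\,w + \sqrt{Q_0}\,u)^2}{2(1 - Q_1)} \Big\}}{\int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big)} \right]^2 , \tag{31}$$

$$0 = \frac{Q_1 - Q_0}{1 + (q-1)Q_1 - qQ_0} + \frac{Q_0 (1 - Q_1)}{\big(1 + (q-1)Q_1 - qQ_0\big)^2} - \frac{\gamma}{2\pi} \int Du\, \frac{\int Dw\, \big(e^{F} H_-^{q-2} + e^{-F} H_+^{q-2}\big) \exp\Big\{ -\frac{(\sqrt{Q_1 - Q_0}\,w + \sqrt{Q_0}\,u)^2}{1 - Q_1} \Big\}}{\int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big)} .$$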

Before solving these equations numerically, we discuss some intuitively clear and also analytically tractable limiting
cases. For $\delta = 0.5$ half the output bits are flipped and there is no remaining correlation between the original output
sequence $\mathbf{s}$ and the sequence $\boldsymbol{\sigma}$ to be learned. Up to terms irrelevant in the limit of large N we obtain at most
$\binom{\gamma N}{\gamma N/2} \simeq 2^{\gamma N}$ possible cells; the spectrum hence equals the global one described in Sec. II. From the first saddle
point equation we calculate F = 0, from the second follows $Q_0 = 0$. The third equation together with (29) confirms
our expectation.

For $\delta = 0$ both sequences $\mathbf{s}$ and $\boldsymbol{\sigma}$ coincide up to a non-extensive fraction of bits. The cluster thus shrinks towards
its central cell, which has the Gardner volume. The cluster spectrum shrinks to a single point at $\alpha_0$ (as defined in
Sec. II) and f = 0. In the saddle point equations (31) we find this result for $F \to -\infty$ and $Q_0 = Q_1 = P(p = 0)$
fulfilling (13) with p = 0.

For q = 0 we obtain for every fixed $\delta$ the storage problem with an output sequence produced by flipping $\delta\gamma N$ bits
randomly chosen from a randomly drawn sequence of length $\gamma N$. The resulting output sequence $\boldsymbol{\sigma}$ is consequently also
a random sequence of independent and unbiased bits. The learning problem is obviously equivalent to the standard
Gardner problem. This is confirmed by $\alpha(q = 0|0, \delta) = \alpha_0\ \forall\delta$, whereas the total number of these cells is $\binom{\gamma N}{\delta\gamma N}$,
resulting in $f(0|0, \delta) = -\gamma\,[\delta \log_2 \delta + (1 - \delta) \log_2(1 - \delta)]$ in the thermodynamic limit.

FIG. 2. Multifractal spectrum $f(\alpha)$ of clusters around p-cells with p = 0, 1 (solid/dashed lines) for $\gamma$ = 0.2 and
$\delta$ = 0.5, 0.3, 0.2, 0.1 (from top). The diamonds mark the crowding indices of the central cells; they coincide with the
spectra for $\delta = 0$.
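The entropy of the q = 0 cells quoted above follows from Stirling's formula applied to the binomial coefficient $\binom{\gamma N}{\delta\gamma N}$. As a small numerical sketch (function names and the chosen values of N, $\gamma$ and $\delta$ are illustrative assumptions), the finite-N count can be compared with the limiting expression:

```python
import math


def binary_entropy(delta):
    """Binary entropy in bits: -delta log2 delta - (1 - delta) log2 (1 - delta)."""
    if delta in (0.0, 1.0):
        return 0.0
    return -delta * math.log2(delta) - (1.0 - delta) * math.log2(1.0 - delta)


def log2_binomial(n, k):
    """log2 of the binomial coefficient (n choose k), computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2.0)


def cluster_entropy(gamma, delta):
    """Thermodynamic-limit value of (1/N) log2 of the number of cluster cells."""
    return gamma * binary_entropy(delta)


if __name__ == "__main__":
    N, gamma, delta = 10 ** 6, 0.5, 0.2
    n_patterns = int(gamma * N)
    n_flips = int(delta * gamma * N)
    print(log2_binomial(n_patterns, n_flips) / N, cluster_entropy(gamma, delta))
```

The two numbers differ only by a correction of order $(\log N)/N$, which vanishes in the thermodynamic limit. For $\delta = 0.5$ the entropy reaches its maximal value $\gamma$, the value of the global spectrum at p = 0.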



The rest of the spectrum has to be analyzed numerically; a typical set of $f(\alpha)$-curves is shown in Fig. 2. The most
interesting point is, besides q = 0 as discussed above, given by q = 1. The total volume of the cluster is given by its
crowding index $\alpha_{cl}(\delta) = \alpha(q = 1|0, \delta) - f(\alpha(q = 1|0, \delta))$. By calculating the storage capacity $\gamma_c(\delta)$ for fixed error rate
$\delta$ from the divergence of $\alpha_{cl}$ we reproduce the replica symmetric results which Gardner and Derrida calculated
for the minimal error rate above $\gamma = 2$. So at least at that point, replica symmetry breaking effects should be taken
into account in the ansatz for the cluster overlap Q. However, due to the complexity of even the replica symmetric
calculation we refrain from doing this.

We still have to remark that the continuation of the mass exponent to negative q is somewhat subtle. This can
be expected already by considering the definition (17) of the restricted partition function $Z(\mathbf{s}, \delta)$. Whenever there
are empty cells, $Z(\mathbf{s}, \delta)$ diverges for every q < 0, leading to $\tau(q < 0|p, \delta) = -\infty$ because of the average over all input
realizations in (18). Without any change of the results for positive q we can regularize $\tau$ by summing only over those
$\boldsymbol{\sigma}$ having a non-vanishing relative cell volume $P(\boldsymbol{\sigma})$, which describes the well-defined multifractal spectrum also for q < 0
via a Legendre transformation.

We consider now the last integral in (29) in the case of negative q. Because of $H(w) \propto \exp(-w^2/2)/(\sqrt{2\pi}\,w)$ for large
w we get an asymptotic exponential part of the last integrand which is proportional to $\exp(-A w^2/2 + O(w))$ with

$$A = \frac{1 + (q-1)Q_1 - qQ_0}{1 - Q_1} . \tag{32}$$

The integral consequently diverges for A < 0, i.e. for every $0 < Q_0 < 1$ once $Q_1$ exceeds $(1 - qQ_0)/(1 - q)$, and the
global minimum in (29) with respect to $Q_1$ is no longer given by the saddle point equations (31). Due to this the
mass exponent would be expected to diverge to $-\infty$ for every q < 0. On the other hand, the continuation of the
saddle point equations (31) to q < 0 gives smooth results for the mass exponent and the multifractal spectrum. We
expect it therefore to describe the correct regularization of the partition function at least within the replica symmetric
approximation.

V. GENERALIZATION WITH ERRORS

In this section we treat the question of generalizing noisy output data. As mentioned in Sec. II, this problem
corresponds to taking p = 1. Also in this case the complex-valued integrals can be evaluated analytically. The mass
exponent is given by



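$$\tau(q|1, \delta) = -\frac{1}{\ln 2}\,\mathrm{extr}_{Q_0,Q_1,R,F} \Big[ \frac{q-1}{2}\ln(1 - Q_1) + \frac{1}{2}\ln\big(1 + (q-1)Q_1 - qQ_0\big) + \frac{q}{2}\,\frac{Q_0 - R^2}{1 + (q-1)Q_1 - qQ_0}$$
$$+ \gamma(1 - 2\delta)F + 2\gamma \int Du\, H\Big( \frac{uR}{\sqrt{Q_0 - R^2}} \Big) \ln \int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big) \Big] \tag{33}$$

with

$$H_\pm = H\Big( \pm \frac{\sqrt{Q_1 - Q_0}\,w + \sqrt{Q_0}\,u}{\sqrt{1 - Q_1}} \Big) . \tag{34}$$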



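Again, the dependence on P vanishes whereas R remains an order parameter to be optimized. We obtain four saddle
point equations which determine $Q_0$, $Q_1$, R, and F:

$$0 = 1 - 2\delta - 2 \int Du\, H\Big( \frac{uR}{\sqrt{Q_0 - R^2}} \Big) \frac{\int Dw\, \big(e^{-F} H_+^q - e^{F} H_-^q\big)}{\int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big)} ,$$

$$0 = \frac{Q_0 - R^2}{\big(1 + (q-1)Q_1 - qQ_0\big)^2} - \frac{\gamma}{\pi(1 - Q_1)} \int Du\, H\Big( \frac{uR}{\sqrt{Q_0 - R^2}} \Big) \left[ \frac{\int Dw\, \big(e^{F} H_-^{q-1} - e^{-F} H_+^{q-1}\big) \exp\Big\{ -\frac{(\sqrt{Q_1 - Q_0}\,w + \sqrt{Q_0}\,u)^2}{2(1 - Q_1)} \Big\}}{\int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big)} \right]^2 , \tag{35}$$

$$0 = \frac{Q_1 - Q_0}{1 + (q-1)Q_1 - qQ_0} + \frac{(Q_0 - R^2)(1 - Q_1)}{\big(1 + (q-1)Q_1 - qQ_0\big)^2} - \frac{\gamma}{\pi} \int Du\, H\Big( \frac{uR}{\sqrt{Q_0 - R^2}} \Big) \frac{\int Dw\, \big(e^{F} H_-^{q-2} + e^{-F} H_+^{q-2}\big) \exp\Big\{ -\frac{(\sqrt{Q_1 - Q_0}\,w + \sqrt{Q_0}\,u)^2}{1 - Q_1} \Big\}}{\int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big)} ,$$

$$0 = \frac{qR}{1 + (q-1)Q_1 - qQ_0} + \frac{\gamma}{\pi} \int du\; \exp\Big\{ -\frac{Q_0 u^2}{2(Q_0 - R^2)} \Big\} \frac{Q_0\, u}{(Q_0 - R^2)^{3/2}} \ln \int Dw\, \big(e^{F} H_-^q + e^{-F} H_+^q\big) .$$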



Several intuitively clear limiting cases can be discussed analytically. As argued in the previous section, for $\delta = 0.5$
we recover the full spectrum with order parameters F = 0, R = 0, $Q_0 = 0$ and $Q_1 = P(q)$. R is the overlap of the
central cell with the cells of the cluster. Its value is found to be zero for all q, indicating that all types of cluster
cells are orthogonal to the teacher vector; their volumes are dominated by the part lying on the (N - 1)-dimensional
"equator". The learning of a vector obtained by flipping half of the teacher's outputs is obviously equivalent to the
storage problem of a random output sequence. The student is not able to get any information about the teacher's
rule.

For $\delta \to 0$ the cell cluster shrinks towards the central cell, which is the version space of the corresponding noise-free
generalization problem. F diverges to $-\infty$ whereas the other three order parameters coincide asymptotically,
$Q_0 = Q_1 = R = P(p = 1)$. The crowding index takes only the value $\alpha_1$, and $f(q|1, 0)$ is found to be zero. The equivalence
of this solution to earlier results of [11] was already noted in Sec. II.

For general $\delta$ the analysis has to be done numerically. In Fig. 2 we show a representative set of spectra for several
values of $\delta$. For growing error rate not only the number of cells in the cluster increases, but also the range of different
existent cell sizes.

For q = 0 we obtain a student who perfectly learns an output sequence generated by a teacher, but flipped with
rate $\delta$. This corresponds to the case of output noise analyzed in [11]. The cell sizes go for $0 < \delta < 0.5$ from $\alpha_1$
to $\alpha_0$, thus interpolating between the noiseless learning from examples and the storage problem for random input-output
relations. This interpretation also gives a meaning to the part of the global cell spectrum for inverse temperatures
between zero and one: at least a proper subset of these cells can be understood as generalization tasks including noisy
output data. Of course, for $\delta > 0$ this task is not learnable for large loading ratios $\gamma$. This observation leads directly
to a storage capacity $\gamma_c(\delta)$ going monotonically from $\gamma_c(0) = \infty$ to the Gardner value $\gamma_c(\delta = 0.5) = 2$. For every $\delta$ the
overlap R between teacher and student is a monotonically increasing function of the loading ratio. Its maximal value
is reached at $R_{max}(\delta) = R(\gamma_c(\delta))$, which remains strictly smaller than one for every $\delta \neq 0$.




FIG. 3. Overlaps $Q_1$, R, $Q_0$ (from top to bottom) for the generalization problem with a student making up to $0.1\gamma N$ (full
lines) and $0.1N$ (dashed lines) errors.

Another problem can be analyzed in the spectrum at q = 1. The total volume of the cell cluster surrounding a cell of
size $\alpha_1$ is given by its total crowding index, which can again be calculated from $\alpha_{cl}(\delta) = \alpha(q = 1|1, \delta) - f(\alpha(q = 1|1, \delta))$.
This learning task corresponds to a lazy student being satisfied with any output having at most $\delta\gamma N$ errors compared
with the sequence of examples presented by the teacher. The student can achieve this for every value of $\gamma$; an
upper threshold for the loading ratio does not exist. As illustrated by the full lines in Fig. 3, in the case of a fixed error
rate $\delta > 0$ the overlap R between teacher and student does not go to one, and the generalization error $\varepsilon = \frac{1}{\pi}\arccos R$
does not tend to zero for increasing loading ratio $\gamma$. The cell volume of every special output realization shrinks to zero,
$Q_1 \to 1$, but this is compensated by a cell number exponentially growing with $\gamma N$. Thus, the resulting total cluster
volume does not vanish, $Q_0 < 1$. If we fix instead the total number of errors, the number of possible representations
does not depend on $\gamma$ either. The vanishing version space volume of every particular output sequence thus results in
a vanishing total volume, leading to a vanishing generalization error in the limit of large loadings $\gamma$, cf. the dashed
lines in Fig. 3. In both cases, the information gain $\partial\alpha_{cl}/\partial\gamma$ goes from values of order one (halving the cell with
every new pattern) for small $\gamma$ to zero for $\gamma \to \infty$.
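The relation between the teacher-student overlap R and the generalization error used above is the standard geometric one for perceptrons with isotropically distributed inputs: $\varepsilon$ is the probability that a random input falls between the two hyperplanes. A minimal sketch (the function name is hypothetical, and the sample overlaps are illustrative values rather than results of the calculation):

```python
import math


def generalization_error(R):
    """Generalization error of a student with overlap R to the teacher."""
    return math.acos(R) / math.pi


if __name__ == "__main__":
    # A plateau R < 1 of the overlap translates into a residual error.
    for R in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(R, generalization_error(R))
```

Since the error vanishes only for R = 1, an overlap that saturates below one at fixed error rate $\delta > 0$ implies a non-vanishing generalization error for all loading ratios.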

The inclusion of simultaneous noise for teacher and student requires the introduction of 6 different replications, resulting
in an even more complex structure of the order parameter equations. Therefore we refrain from doing it.



VI. ON THE SPATIAL CELL DISTRIBUTION 



From the spectrum of the internal representations in a cluster we can also get some information on the spatial
distribution of the cells. If the latter were totally random, we would not expect any dependence of the internal cluster
spectrum on the central cell, i.e. on p. In this case, cells of all possible sizes should be contained in the cluster. From
Fig. 2, where the spectra are plotted for p = 0, 1, we can deduce that the distribution has some structure. Reducing
$\delta$ from 0.5 does not only decrease the number of cells, but also the range of different cell sizes. Both very large as well
as very small cells are excluded.

In the neighborhood of $\delta = 0$ the spectrum is concentrated in a small interval around the crowding index $\alpha_p$ of the
central cell. This means that every cell is surrounded by cells having almost the same size, leading to some kind of
clustering of cells of nearly equal size. So no very small cells appear in the neighborhood of very large cells and
vice versa. Of course, due to the symmetry of the probability distributions for the input patterns, these "clusters" of
nearly equally sized cells are isotropically located in coupling space.

VII. SUMMARY 

In the present paper we have analyzed the internal structure of cell clusters having a given output overlap with 
a certain central cell. The calculation of the internal multifractal spectrum of such clusters allowed us to discuss 
various storage and generalization problems of noisy output data within one single unified approach. The analysis 
included both the case of a lazy student who is satisfied whenever he achieves some maximal error rate, as well as
the case of an absent-minded teacher offering incorrect data to his student. In the global cell spectrum of the whole
coupling space it was not possible to give an interpretation to cells of crowding indices in-between the Gardner value 
ao and the generalization value ai. As a result of the present approach we are able to understand at least a proper 
subset of these cells as related to generalization tasks with output noise. Additionally we have shown that every cell 
is surrounded by cells having nearly the same size. The range of realized sizes is increasing with decreasing overlap 
of the output sequences labeling the cells. 

We are aware of the fact that the multifractal approach is plagued by the existence of replica symmetry breaking, 
but due to the technical difficulties of a calculation which includes four different kinds of replicas we restricted our 
analysis to the replica symmetric ansatz. The inclusion of replica symmetry breaking effects would surely change 
some of the calculated quantities, but the qualitative picture would probably remain unchanged. 

Acknowledgments: Many thanks to A. Engel and J. Berg for illuminating discussions and for careful reading of the
manuscript.